A New Foundation For Control-Dependence and Slicing for Modern Program Structures


Venkatesh Ranganath1, Torben Amtoft1, Anindya Banerjee1, Matthew B. Dwyer2, and John Hatcliff1

1 Department of Computing and Information Sciences, Kansas State University

2 Department of Computer Science and Engineering, University of Nebraska, Lincoln


Abstract. The notion of control dependence underlies many program analysis and transformation techniques used in numerous applications. Despite wide application, existing definitions and approaches to calculating control dependence are difficult to apply seamlessly to modern program structures. Such program structures make substantial use of exception processing and increasingly support reactive systems designed to run indefinitely.

This paper revisits foundational issues surrounding control dependence and develops definitions and algorithms for computing control dependence that can be directly applied to modern program structures. In the context of slicing reactive systems, the paper proposes a notion of slicing correctness based on weak bisimulation and proves that the definition of control dependence generates slices that conform to this notion of correctness. Finally, a variety of properties show that the new definitions conservatively extend classic definitions. These new definitions and algorithms for control dependence form the basis of a publicly available program slicer that has been implemented for full Java.


1  Introduction 

The notion of control-dependence underlies many program analysis and transformation techniques used in numerous applications, including program slicing applied for program understanding [1], debugging [2], and optimization; partial evaluation [3]; and compiler optimizations [4] such as global scheduling, loop fusion, and code motion. Intuitively, a program statement $n_1$ is control-dependent on a statement $n_2$ if $n_2$ (typically, a conditional statement) controls whether or not $n_1$ will be executed or bypassed during an execution of the program.

While existing definitions and approaches to calculating control dependence and slicing are widely applied and have been used in the current form for well over 20 years, there are several aspects of these definitions that prevent them from being applied smoothly to modern program structures which rely significantly on exception processing and increasingly support reactive systems which are designed to run indefinitely.

(I.)  Classic  definitions  of  control  dependence  are  stated  in  terms  of  program  control- 
flow  graphs  (CFGs)  in  which  the  CFG  has  a  unique  end  node  -  they  do  not  apply  directly 
to  program  CFGs  with  (a)  multiple  end  nodes  or  with  (b)  no  end  node.  Restriction  (a) 
means  that  existing  definitions  cannot  be  applied  directly  to  programs/methods  with 
multiple  exit  points  -  a  restriction  that  would  be  violated  by  any  method  that  raises 
exceptions or includes multiple returns. Restriction (b) means that existing definitions
cannot be applied directly to reactive programs or system models with control loops that
are designed to run indefinitely.

Restriction (a) is usually addressed by performing a pre-processing step that transforms a CFG with multiple end nodes into a CFG with a single end node by adding a new designated end node to the CFG and inserting arcs from all original exit states to the new end node [5, 1]. Restriction (b) can also be addressed in a similar fashion by, e.g., selecting a single node within the CFG to represent the end node. This case is more problematic than the pre-processing for Restriction (a) because the criteria for selecting end nodes that lead to the desired control dependence relation between program nodes are often unclear. This is particularly true in threads such as event-handlers which have no explicit shut-down methods, but are “shut down” by killing the thread (thus, there is nothing in the thread’s control flow to indicate an exit point).

(II.) Existing definitions of slicing correctness either apply to programs with terminating execution traces, or they often fail to state whether or not the slicing transformation preserves the termination behavior of the program being sliced. Thus these definitions cannot be applied to reactive programs that are designed to execute indefinitely. Such programs are used in numerous modern applications such as event-processing modules in GUI systems, web services, and distributed real-time systems with autonomous components, e.g., data sensors.

Despite the difficulties, it appears that researchers and practitioners do continue to apply slicing transformations to programs that fail to satisfy the restrictions above. However, in reality the pre-processing transformations related to issue (I) introduce extra overhead into the entire transformation pipeline, clutter up program transformation and visualization facilities, necessitate the use/maintenance of mappings from the transformed CFGs back to the original CFGs, and introduce extraneous structure with ad-hoc justifications that all down-stream tools/transformations must interpret and build on in a consistent manner. Moreover, regarding issue (II), it will be infeasible to continue to ignore issues of termination as slicing is increasingly applied in high-assurance applications such as reducing models for verification [6] and for reasoning about security issues where it is crucial that liveness/non-termination properties be preserved.

Working  on  a  larger  project  on  slicing  concurrent  Java  programs,  we  have  found 
it  necessary  to  revisit  basic  issues  surrounding  control  dependence  and  have  sought  to 
develop  definitions  that  can  be  directly  applied  to  modern  program  structures  such  as 
those  found  in  reactive  systems.  In  this  paper,  we  propose  and  justify  the  usefulness  and 
correctness  of  simple  definitions  of  control  dependence  that  overcome  the  problematic 
aspects  of  the  classic  definitions  described  above.  The  specific  contributions  of  this  paper 
are  as  follows. 

— We propose new definitions of control dependence that are simple to state and easy to calculate and that work directly on control-flow graphs that may have no end nodes or non-unique end nodes, thus avoiding troublesome pre-processing CFG transformations (Section 4).

— We prove that these definitions applied to reducible CFGs yield slices that are correct according to generalized notions of slicing correctness based on a form of weak bisimulation that is appropriate for programs with infinite execution traces (Section 5.1).

— We clarify the relationship between our new definitions and classic definitions by showing that our new definitions represent a form of “conservative extension” of classic definitions: when our new definitions are applied to CFGs that conform to the restriction of a single end node, our definitions correspond to classic definitions - they do not introduce any additional dependences nor do they omit any dependences (Section 5.1).

— We discuss the intuitions behind algorithms for computing control dependence (according to the new definitions) to justify that control dependence is computable in polynomial time (Section 6).

Expanded  discussions,  definitions  and  full  proofs  appear  in  the  companion  technical 
report  [7]  which  can  be  found  on  the  project  web  site  [8]. 

The proposed notions of control dependence described in this paper have been implemented in Indus-Kaveri [8] - our publicly available open-source Eclipse-based Java slicer that works on full Java 1.4 and has been applied to code bases of up to 10,000 lines of Java application code (< 80K bytecodes) excluding library code. Besides its application as a stand-alone program visualization, debugging, and code transformation tool, our slicer is being used in the next generation of our Bandera tool set for model-checking concurrent Java systems.

2  Basic  Definitions 

2.1  Control  Flow  Graphs 

When  dealing  with  foundational  issues  of  control  dependence,  researchers  often  cast  their 
work  in  terms  of  a  simple  imperative  language  phrased  in  terms  of  control  flow  graphs. 
We  follow  that  practice  here  and  base  our  presentation  on  a  definition  of  control-flow 
graph  adapted  from  Ball  and  Horwitz  [9]. 

Definition  1  (Control  Flow  Graphs). 

A control-flow graph $G = (N, E, n_0)$ is a labeled directed graph in which

— $N$ is a set of nodes that represent commands in a program,

— the set $N$ is partitioned into two subsets $N_S$, $N_P$, where $N_S$ are statement nodes with each $n_s \in N_S$ having at most one successor, where $N_P$ are predicate nodes with each $n_p \in N_P$ having two successors, and $N_E \subseteq N_S$ contains all nodes of $N_S$ that have no successors, i.e., $N_E$ contains all end nodes of $G$,

— $E$ is a set of labeled edges that represent the control flow between graph nodes where each $n_p \in N_P$ has two outgoing edges labeled $T$ and $F$ respectively, and each $n_s \in (N_S - N_E)$ has an outgoing edge labeled $A$ (representing Always taken),

— the start node $n_0$ has no incoming edges and all nodes in $N$ are reachable from $n_0$.

We  will  display  the  labels  on  CFG  edges  only  when  necessary  for  the  current  exposition. 

As  stated  earlier,  existing  presentations  of  slicing  require  that  each  CFG  G  satisfies 
the  unique  end  node  property:  there  is  exactly  one  element  in  NE  =  {ne}  and  ne  is 
reachable  from  all  other  nodes  of  G.  The  definition  above  does  not  require  this  property 
of  CFGs,  but  we  will  sometimes  consider  CFGs  with  the  unique  end  node  property  in 
our  comparisons  to  previous  work. 
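As a minimal sketch, Definition 1 and the unique end node property might be encoded as follows in Python. The class name `CFG`, its method names, and the two example graphs are illustrative assumptions; they are not taken from the paper's figures or from the Indus implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CFG:
    """Labeled directed graph in the spirit of Definition 1 (illustrative names)."""
    start: str
    # edges[n] maps an edge label ("T", "F", or "A") to the successor of n
    edges: dict = field(default_factory=dict)

    def nodes(self):
        found = set(self.edges) | {self.start}
        for labeled in self.edges.values():
            found.update(labeled.values())
        return found

    def successors(self, n):
        return set(self.edges.get(n, {}).values())

    def predicate_nodes(self):
        # predicate nodes carry exactly the labels T and F
        return {n for n in self.nodes() if set(self.edges.get(n, {})) == {"T", "F"}}

    def end_nodes(self):
        # end nodes are the nodes without successors
        return {n for n in self.nodes() if not self.edges.get(n)}

    def reachable_from(self, n):
        seen, work = {n}, [n]
        while work:
            for s in self.successors(work.pop()):
                if s not in seen:
                    seen.add(s)
                    work.append(s)
        return seen

    def is_well_formed(self):
        labels_ok = all(
            set(self.edges.get(n, {})) in ({"T", "F"}, {"A"}, set())
            for n in self.nodes()
        )
        start_has_incoming = any(self.start in self.successors(n) for n in self.nodes())
        all_reachable = self.reachable_from(self.start) == self.nodes()
        return labels_ok and not start_has_incoming and all_reachable

    def has_unique_end_node(self):
        ends = self.end_nodes()
        if len(ends) != 1:
            return False
        (end,) = ends
        return all(end in self.reachable_from(n) for n in self.nodes())


# Hypothetical graph: a branch into a loop (b, c) that may exit to the end node d.
loop = CFG("a", {"a": {"T": "b", "F": "d"}, "b": {"A": "c"}, "c": {"T": "b", "F": "d"}})
# Hypothetical reactive graph: a dispatch loop with no end node at all.
reactive = CFG("s", {"s": {"A": "p"}, "p": {"T": "q", "F": "r"},
                     "q": {"A": "p"}, "r": {"A": "p"}})
```

Both graphs are well formed under Definition 1, but only `loop` has the unique end node property; `reactive` has no end node, which is the situation the classic definitions exclude.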

To relate a CFG with the program that it represents, we use the function code to map a CFG node $n$ to the code for the program statement that corresponds to that node. Specifically, for $n_s \in N_S$, code($n_s$) yields the code for an assignment statement, and for $n_p \in N_P$, code($n_p$) yields the code for the test of a conditional statement (the labels on the edges for $n_p$ allow one to determine the nodes for the true and false branches of the conditional). The function def maps each node to the set of variables defined (i.e., assigned to) at that node (always a singleton or empty set), and ref maps each node to the set of variables referenced at that node.

A CFG path $\pi$ from $n_i$ to $n_k$ is a sequence of nodes $n_i, n_{i+1}, \ldots, n_k$ such that for every consecutive pair of nodes $(n_j, n_{j+1})$ in the path there is an edge from $n_j$ to $n_{j+1}$. A path between nodes $n_i$ and $n_k$ can also be denoted as $[n_i..n_k]$. When the meaning is clear from the context, we will use $\pi$ to denote the set of nodes contained in $\pi$ and we write $n \in \pi$ when $n$ occurs in the sequence $\pi$. Path $\pi$ is non-trivial if it contains at least two nodes. A path is maximal if it is infinite or if it terminates in an end node.

The following definitions describe relationships between graph nodes and the distinguished start and end nodes [10]. Node $n$ dominates node $m$ in $G$ (written dom($n$, $m$)) if every path from the start node $s$ to $m$ passes through $n$ (note that this makes the dominates relation reflexive). Node $n$ post-dominates node $m$ in $G$ (written post-dom($n$, $m$)) if every path from node $m$ to the end node $e$ passes through $n$. Node $n$ strictly post-dominates node $m$ in $G$ if post-dom($n$, $m$) and $n \neq m$. Node $n$ is the immediate post-dominator of node $m$ if $n \neq m$ and $n$ is the first post-dominator on every path from $m$ to the end node $e$. Node $n$ strongly post-dominates node $m$ in $G$ if $n$ post-dominates $m$ and there is an integer $k \geq 1$ such that every path from node $m$ of length $\geq k$ passes through $n$ [1]. The difference between strong post-domination and the simple definition of post-domination above is that even though node $n$ occurs on every path from $m$ to $e$ (and thus $n$ post-dominates $m$), it may be the case that there is a loop in the CFG between $m$ and $n$ that admits an infinite path beginning at $m$ that never encounters $n$. Strong post-domination rules out the possibility of such loops between $m$ and $n$ - thus, it is sensitive to the possibility of non-termination along paths from $m$ to $n$. Note that domination relations are well-defined but post-domination relationships are not well-defined for graphs that do not have the unique end node property.
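The (simple) post-domination relation can be computed with a standard iterative fixed-point scheme. The sketch below is one common way to do so and is not claimed to be the algorithm used by the authors; it assumes a CFG with the unique end node property, given as a successor map in which every node appears as a key. The function name and the example graph are hypothetical.

```python
def post_dominators(succ, end):
    """Return pdom with pdom[m] = {n | post-dom(n, m)}.

    Assumes `end` is the unique end node and is reachable from every node,
    so every other node has at least one successor.
    """
    nodes = set(succ)
    # Start from the largest candidate sets and shrink them to a fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[end] = {end}
    changed = True
    while changed:
        changed = False
        for m in nodes - {end}:
            # n post-dominates m iff n == m or n post-dominates every successor of m
            new = {m} | set.intersection(*(pdom[s] for s in succ[m]))
            if new != pdom[m]:
                pdom[m] = new
                changed = True
    return pdom


# Hypothetical graph: a branches to b or d; b -> c; c loops back to b or exits to d.
succ = {"a": ["b", "d"], "b": ["c"], "c": ["b", "d"], "d": []}
pdom = post_dominators(succ, "d")
```

In this example `d` post-dominates `b` and `c` even though the loop between `b` and `c` admits an infinite path that never reaches `d`; this is exactly the case that strong post-domination rules out.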

A CFG $G$ is reducible if $E$ can be partitioned into disjoint sets $E_f$ (the forward edge set) and $E_b$ (the back edge set) such that $(N, E_f)$ forms a DAG in which each node can be reached from the entry node $n_0$ and for all edges $e \in E_b$, the target of $e$ dominates the source of $e$. All “well-structured” programs give rise to reducible control-flow graphs, including Java programs. Our definitions and most of our correctness results apply to irreducible CFGs as well, but our bisimulation-based correctness of slicing result only holds for reducible graphs since bisimulation requires ordering properties that can only be guaranteed on reducible graphs.

2.2  Program  Execution 

The execution semantics of program CFGs is phrased in terms of transitions on program states $(n, \sigma)$ where $n$ is a CFG node and $\sigma$ is a store mapping the corresponding program’s variables to values. A series of transitions gives an execution trace through $p$’s statement-level control flow graph. It is important to note that when execution is in state $(n_i, \sigma_i)$, the code at node $n_i$ has not yet been executed. Intuitively, the code at $n_i$ is executed on the transition from $(n_i, \sigma_i)$ to successor state $(n_{i+1}, \sigma_{i+1})$. Execution begins at the start node $n_0$, and the execution of each node possibly updates the store and transfers control to an appropriate successor node. Execution of a node $n_e \in N_E$ produces a final state $(halt, \sigma)$ where the control point is indicated by a special label $halt$ - this indicates a normal termination of program execution. The presentation of slicing in the next section involves arbitrary finite and infinite non-empty sequences of states written $\Pi = s_1, s_2, \ldots$. For a set of variables $V$, we write $\sigma_1 =_V \sigma_2$ when for all $x \in V$, $\sigma_1(x) = \sigma_2(x)$.
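The transition semantics can be illustrated with a small generator, given here as a sketch under stated assumptions: node code is modeled by Python callables over a dictionary store, edges use the labels T, F, and A, and the names `trace`, `agree_on`, `cfg`, and `code` are hypothetical.

```python
def trace(cfg, code, start, store):
    """Yield the states (n, sigma) of one execution.

    The code at n has not yet run when (n, sigma) is yielded; it runs on the
    transition to the successor state.
    """
    node, sigma = start, dict(store)
    while True:
        yield node, dict(sigma)
        if node == "halt":
            return
        out = cfg[node]
        result = code[node](sigma)   # execute the code at the current node
        if "A" in out:               # statement node with a successor
            node = out["A"]
        elif out:                    # predicate node: follow T or F
            node = out["T"] if result else out["F"]
        else:                        # end node: normal termination
            node = "halt"


def agree_on(variables, sigma1, sigma2):
    """sigma1 =_V sigma2: both stores agree on every variable in V."""
    return all(sigma1.get(x) == sigma2.get(x) for x in variables)


# Hypothetical program: x = 0; while (x < 3) x = x + 1; y = x
cfg = {"n0": {"A": "n1"}, "n1": {"T": "n2", "F": "n3"}, "n2": {"A": "n1"}, "n3": {}}
code = {
    "n0": lambda s: s.update(x=0),
    "n1": lambda s: s["x"] < 3,
    "n2": lambda s: s.update(x=s["x"] + 1),
    "n3": lambda s: s.update(y=s["x"]),
}
states = list(trace(cfg, code, "n0", {}))
```

For a reactive program the generator never terminates, so a caller would bound it, for example with `itertools.islice`.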

2.3  Notions  of  Dependence  and  Slicing 

A  program  slice  consists  of  the  parts  of  a  program  p  that  (potentially)  affect  the  variable 
values  that  are  referenced  at  some  program  points  of  interest  [11].  Traditionally,  the 
program “points of interest” are called the slicing criterion. A slicing criterion $C$ for a program $p$ is a non-empty set of nodes $\{n_1, \ldots, n_k\}$ where each $n_i$ is a node in $p$’s CFG.

The  definitions  below  recall  the  two  basic  notions  of  dependence  that  appear  in  slicing 
of  sequential  programs:  data  dependence  and  control  dependence  [11]. 


Data  dependence  captures  the  notion  that  a  variable  reference  is  dependent  upon  any 
variable  definition  that  “reaches”  the  reference. 

Definition 2 (data dependence). Node $n$ is data-dependent on $m$ (written $m \xrightarrow{dd} n$ - the arrow pointing in the direction of data flow) if there is a variable $v$ such that

1. there exists a non-trivial path $\pi$ in $p$’s CFG from $m$ to $n$ such that for every node $m' \in \pi - \{m, n\}$, $v \notin def(m')$, and

2. $v \in def(m) \cap ref(n)$.
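Definition 2 can be read as a reachability problem: starting from the successors of a defining node, follow edges until the variable is redefined, and record every node that references it. The following is a sketch of that reading; the function name, the map-based encoding of def and ref, and the example program are assumptions made for illustration.

```python
def data_dependences(succ, defs, refs):
    """Return the set of pairs (m, n) such that n is data-dependent on m."""
    deps = set()
    for m, defined in defs.items():
        for v in defined:
            seen, work = set(), list(succ.get(m, []))
            while work:
                n = work.pop()
                if n in seen:
                    continue
                seen.add(n)
                if v in refs.get(n, set()):
                    deps.add((m, n))          # v in def(m) and v in ref(n)
                if v not in defs.get(n, set()):
                    # n does not redefine v, so it may be an interior path node
                    work.extend(succ.get(n, []))
    return deps


# Hypothetical program: n0: x = 0; n1: x < 3; n2: x = x + 1; n3: y = x
succ = {"n0": ["n1"], "n1": ["n2", "n3"], "n2": ["n1"], "n3": []}
defs = {"n0": {"x"}, "n2": {"x"}, "n3": {"y"}}
refs = {"n1": {"x"}, "n2": {"x"}, "n3": {"x"}}
dd = data_dependences(succ, defs, refs)
```

Note that `n2` is data-dependent on itself here, because the definition at `n2` reaches the reference at `n2` around the loop.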

Control  dependence  information  identifies  the  conditionals  that  may  affect  execution 
of  a  node  in  the  slice.  Intuitively,  node  n  is  control-dependent  on  a  predicate  node  m  if 
m  directly  determines  whether  n  is  executed  or  “bypassed” . 

Definition 3 (control dependence). Node $n$ is control-dependent on $m$ in program $p$ (written $m \xrightarrow{cd} n$) if

1. there exists a non-trivial path $\pi$ from $m$ to $n$ in $p$’s CFG such that every node $m' \in \pi - \{m, n\}$ is post-dominated by $n$, and

2. $m$ is not strictly post-dominated by $n$.

For a node $n$ to be control-dependent on predicate $m$, there must be two paths that connect $m$ with the unique end node $e$ such that one contains $n$ and the other does not. There are several slightly different notions of control-dependence appearing in the
literature,  and  we  will  consider  several  of  these  variants  and  relations  between  them  in 
the  rest  of  the  paper.  At  present,  we  simply  note  that  the  above  definition  is  standard 
and  widely  used  (e.g.,  see  [10]). 
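To make Definition 3 concrete, the sketch below checks its two conditions literally on a small graph. It assumes a CFG with the unique end node property, given as a successor map with every node as a key; the helper names and the example graph are hypothetical and the code favors clarity over efficiency.

```python
def post_dominators(succ, end):
    """pdom[m] = set of nodes that post-dominate m (unique end node assumed)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[end] = {end}
    changed = True
    while changed:
        changed = False
        for m in nodes - {end}:
            new = {m} | set.intersection(*(pdom[s] for s in succ[m]))
            if new != pdom[m]:
                pdom[m], changed = new, True
    return pdom


def control_dependences(succ, end):
    """Return the set of pairs (m, n) such that n is control-dependent on m."""
    pdom = post_dominators(succ, end)
    deps = set()
    for m in succ:
        for n in succ:
            if n != m and n in pdom[m]:
                continue  # condition 2 fails: n strictly post-dominates m
            # Condition 1: look for a path m ... n whose interior nodes
            # (other than m and n) are all post-dominated by n.
            seen, work, found = set(), list(succ[m]), False
            while work and not found:
                x = work.pop()
                if x == n:
                    found = True
                elif x not in seen and (x == m or n in pdom[x]):
                    seen.add(x)
                    work.extend(succ[x])
            if found:
                deps.add((m, n))
    return deps


# Hypothetical graph: a branches to b or d; b -> c; c loops back to b or exits to d.
succ = {"a": ["b", "d"], "b": ["c"], "c": ["b", "d"], "d": []}
cd = control_dependences(succ, "d")
```

On this graph no node controls `d`, since `d` strictly post-dominates every other node, even though the loop test `c` decides whether `d` is ever reached.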

We write $m \xrightarrow{d} n$ when either $m \xrightarrow{dd} n$ or $m \xrightarrow{cd} n$. Constructing a program slice proceeds by finding the set of CFG nodes $S_C$ (called the slice set) from which the nodes in $C$ are reachable via $\xrightarrow{d}$.

Definition 4 (slice set). Let $C$ be a slicing criterion for program $p$. Then the slice set $S_C$ of $p$ with respect to $C$ is defined as follows:

$$S_C = \{ m \mid \exists n .\ n \in C \text{ and } m \xrightarrow{d}{}^{*}\ n \}.$$
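Computing the slice set amounts to a backward reachability search over the union of the data and control dependence edges. A minimal sketch follows; the function name and the example dependence pairs are made up for illustration.

```python
def slice_set(deps, criterion):
    """Nodes m with m ->* n for some criterion node n.

    `deps` is a set of pairs (m, n) meaning that n depends on m.
    """
    preds = {}
    for m, n in deps:
        preds.setdefault(n, set()).add(m)
    # The closure is reflexive, so the criterion nodes belong to the slice.
    result, work = set(criterion), list(criterion)
    while work:
        n = work.pop()
        for m in preds.get(n, ()):
            if m not in result:
                result.add(m)
                work.append(m)
    return result


# Hypothetical dependence edges over nodes n0..n5.
deps = {("n0", "n1"), ("n2", "n1"), ("n1", "n2"),
        ("n0", "n3"), ("n2", "n3"), ("n4", "n5")}
sc = slice_set(deps, {"n3"})
```

Here `n1` enters the slice only transitively (through `n2`), while `n4` and `n5` are left out because nothing in the criterion depends on them.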

The notion of slicing described above is referred to as “backward static slicing” because the algorithm starts at the criterion nodes and looks backward through the program’s control-flow graph to find other program statements that influence the execution at the criterion nodes. In this paper we consider only backward slices, but our definitions of control dependence can also be applied when computing forward slices.

In many cases in the slicing literature, the desired correspondence between the source program and the slice is not formalized because the emphasis is often on applications rather than foundations, and this also leads to subtle differences between presentations. When a notion of “correct slice” is given, it is often stated using the notion of projection [12]. Informally, given an arbitrary trace $\Pi$ of $p$ and an analogous trace $\Pi_s$ of $p_s$, $p_s$ is a correct slice of $p$ if projecting out the nodes in criterion $C$ (and the variables referenced at those nodes) for both $\Pi$ and $\Pi_s$ yields identical state sequences. We will consider slicing correctness requirements in greater detail in Section 5.1.

3  Assessment  of  Existing  Definitions 

3.1  Variations  in  Existing  Control  Dependence  Definitions 

Although the definition of control dependence that we stated in Section 2 is widely used, there are a number of (sometimes subtle) variations appearing in the literature. One dimension of variation is whether the particular definition captures only direct control dependence or also admits indirect control dependences. For example, using the definition of control dependence in Definition 3, for Fig. 1 (a), we can conclude that $a \xrightarrow{cd} f$ and $f \xrightarrow{cd} g$; however, $a \xrightarrow{cd} g$ does not hold because $g$ does not post-dominate $f$. The fact that $a$ and $g$ are indirectly related ($a$ does play a role in determining if $g$ is executed or bypassed) is not captured in the definition of control dependence itself but in the transitive closure used in the slice set construction (Definition 4). However, some definitions of control dependence [1] incorporate this notion of transitivity directly into the definition itself as we will illustrate later.

Another dimension of variation is whether the particular definition is sensitive to non-termination or not. Consider Fig. 1 (a) where node $c$ represents a post-test that controls a loop which may be infinite (one cannot tell by simply looking at the CFG). According to Definition 3, $a \xrightarrow{cd} d$ but $c \xrightarrow{cd} d$ does not hold (because $d$ post-dominates $c$) even though $c$ may determine whether $d$ executes or never gets to execute due to an infinite loop that postpones $d$ forever. Thus, Definition 3 is non-termination insensitive.

We  now  further  illustrate  these  dimensions  by  recalling  definitions  of  strong  and  weak 
control  dependence  given  by  Podgurski  and  Clarke  [1]  and  used  in  a  number  of  works 
including  the  study  of  control  dependence  by  Bilardi  and  Pingali  [13]. 

Definition  5  (Podgurski-Clarke  Control  Dependence). 

— $n_2$ is strongly control dependent on $n_1$ ($n_1 \xrightarrow{PC\text{-}scd} n_2$) if there is a path from $n_1$ to $n_2$ that does not contain the immediate post-dominator of $n_1$.

— $n_2$ is weakly control dependent on $n_1$ ($n_1 \xrightarrow{PC\text{-}wcd} n_2$) if $n_2$ strongly post-dominates $n_1'$, a successor of $n_1$, but does not strongly post-dominate $n_1''$, another successor of $n_1$.

The notion of strong control dependence above roughly corresponds to Definition 3, but it captures indirect control dependence whereas Definition 3 captures only direct control dependence. For example, in Fig. 1, in contrast to Definition 3 we have $a \xrightarrow{PC\text{-}scd} g$ because there is a path $afg$ which does not contain the immediate post-dominator of $a$. However, one can show that when used in the context of Definition 4 (which computes the transitive closure of dependences), the two definitions give rise to the same slices. The notion of weak control dependence above subsumes the notion of strong control dependence ($n_1 \xrightarrow{PC\text{-}scd} n_2$ implies $n_1 \xrightarrow{PC\text{-}wcd}{}^{*}\ n_2$) and it captures weaker dependences between nodes induced by non-termination, that is, it is non-termination sensitive. Note that for Fig. 1 (a), $c \xrightarrow{PC\text{-}wcd} d$ because $d$ does not strongly post-dominate $b$: the presence of the loop controlled by $c$ guarantees that there does not exist a $k$ such that every path from node $b$ of length $\geq k$ passes through $d$.

In  assessing  the  above  variants  of  control  dependence  in  the  context  of  program 
slicing,  it  is  important  to  note  that  slicing  based  on  Definition  3  or  the  strong  control 
dependence  above  can  transform  a  non-terminating  program  into  a  terminating  one 
(i.e.,  non-termination  is  not  preserved  in  the  slice).  In  Fig.  1  (a),  assume  that  the  loop 
controlled  by  c  is  an  infinite  loop.  Using  the  slice  criterion  C  =  {d}  would  include  a 
but  not  b  and  c  (we  assume  no  data  dependence  between  d  and  b  or  c)  if  the  slicing  is 
based  on  strong  control  dependence.  Thus,  in  the  sliced  program,  one  would  be  able  to 
observe  an  execution  of  d ,  but  such  an  observation  is  not  possible  in  the  original  program 
because execution diverges before d is reached. In contrast, the difference between direct and indirect statements of control dependence seems to be largely a technical stylistic decision in how the definitions are stated.

Very few works consider the non-termination sensitive notion of weak control dependence above. We conjecture that there are at least two reasons for this. First, although it bears the qualifier “weak”, weak control dependence is actually a stronger relation (relating more nodes) and will thus include more nodes in the slice. Second, many applications of slicing focus on debugging and program visualization and understanding, and in these applications having slices that preserve non-termination is less important than having smaller slices. However, slicing is increasingly used in security applications and as a model-reduction technique for software model checking. In these applications, it is quite important to consider variants of control dependence that preserve non-termination properties, since failure to do so could allow inferences to be made that compromise security policies or, for instance, invalidate checks of liveness properties [6].

Fig. 1. (a) is a simple CFG. (b) illustrates how a CFG that does not have a unique exit node reachable from all nodes can be augmented to have a unique exit node reachable from all nodes. (c) is a CFG with multiple control sinks of different sorts.

3.2 Unique End Node Restriction on CFGs

All  definitions  of  control  dependences  that  we  are  aware  of  require  that  CFGs  satisfy  the 
unique  end  node  requirement  -  but  many  software  systems  fail  to  satisfy  this  property. 
Existing  works  simply  require  that  CFGs  have  this  property,  or  they  suggest  that  CFGs 
can  be  augmented  to  achieve  this  property,  e.g.,  using  the  following  steps:  (1)  insert  a 
new  node  e  into  the  CFG,  (2)  add  an  edge  from  each  exit  node  (other  than  e)  to  e,  (3) 
pick  an  arbitrary  node  n  in  each  non-terminating  loop  and  add  an  edge  from  n  to  e. 
In  our  experience,  such  augmentations  complicate  the  system  being  analyzed  in  several 
ways.  If  the  augmentation  is  non-destructive,  a  new  CFG  is  generated  which  costs  time 
and  memory.  If  the  augmentation  is  destructive,  this  may  clash  with  the  requirements 
of  other  clients  of  the  CFG,  thus  necessitating  the  reversal  of  the  augmentation  before 
subsequent analyses can proceed. In addition, having multiple end nodes (e.g., an exceptional exit and a regular return) flow into a single new end node causes semantically different information to flow together.
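The three augmentation steps can be sketched as follows. This is a non-destructive variant that copies the successor map; the function name, the name `e_new` for the inserted node, and the rule for picking a node in a non-terminating loop (the smallest node name among those that cannot reach the new end node) are assumptions of this sketch, since the steps above leave that choice arbitrary.

```python
def augment(succ, new_end="e_new"):
    """Return a copy of `succ` that has the unique end node property."""
    aug = {n: list(ss) for n, ss in succ.items()}   # non-destructive copy
    exits = [n for n, ss in aug.items() if not ss]
    aug[new_end] = []                                # step (1): insert new node
    for n in exits:
        aug[n].append(new_end)                       # step (2): exits flow to it

    def reaches_end():
        ok, changed = {new_end}, True
        while changed:
            changed = False
            for n, ss in aug.items():
                if n not in ok and any(s in ok for s in ss):
                    ok.add(n)
                    changed = True
        return ok

    while True:
        stuck = set(aug) - reaches_end()
        if not stuck:
            return aug
        # step (3): pick a node in a non-terminating loop (arbitrary choice)
        aug[min(stuck)].append(new_end)


# Hypothetical graph: a branches to an exit x or into an endless loop b <-> c.
succ = {"a": ["b", "x"], "b": ["c"], "c": ["b"], "x": []}
aug = augment(succ)
```

In the result, statement node `b` has acquired a second successor and therefore looks like a predicate node to later analyses, which illustrates the kind of extraneous structure that augmentation introduces.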

Many systems have threads where the main control loop has no exit - the loop is “exited” by simply killing the thread. For example, in the Xt library, most applications create widgets, register callbacks, and call XtAppMainLoop() to enter an infinite loop that manages the dispatching of events to the widgets in the application. In PalmOS, applications are designed such that they start upon receiving a start code, execute a loop, and terminate upon receiving a stop code. However, the application may choose to ignore the stop code once it starts, and hence, not terminate except when it is explicitly killed. In such cases, a node in the loop must be picked as the loop exit node for the purpose of augmenting the CFG. However, this can disrupt the control dependence calculations. In Fig. 1 (b), we would intuitively expect $e$, $b$, $c$, and $d$ to be control dependent on $a$ in the unaugmented CFG. However, $a \xrightarrow{PC\text{-}wcd} \{e, b, c\}$ and $c \xrightarrow{PC\text{-}wcd} \{b, c, d, f\}$ in the augmented CFG. It is trivial to prune dependences involving $f$. However, there are new dependences $c \xrightarrow{PC\text{-}wcd} \{b, c, d\}$ which did not exist in the unaugmented CFG. Although a suggestion to delete any dependence on $c$ may work for the given CFG, it fails if there exists a node $g$ that is a successor of $c$ and a predecessor of $d$. Also, $a \xrightarrow{PC\text{-}wcd} d$ exists in the unaugmented CFG but not in the augmented CFG, and it is not obvious how to recover this information.

We address these issues head-on by considering alternate definitions of control dependence that do not impose the unique end node restriction.


4  New  Dependence  Definitions 

In previous definitions, a control dependence relationship where $n_j$ is dependent on $n_i$ is specified by considering paths from $n_i$ and $n_j$ to a unique CFG end node - essentially $n_i$ and the end node delimit the path segments that are considered. Since we aim for definitions that apply when CFGs do not have an end node or have more than one end node, we aim to instead specify that $n_j$ is control dependent on $n_i$ by focusing on paths between $n_i$ and $n_j$. Specifically, we focus on path segments that are delimited by $n_i$ at both ends - intuitively corresponding to the situation in a reactive program where instead of reaching an end node, a program’s behavior begins to repeat itself by returning again to $n_i$. At a high level, the intuition remains the same as in, e.g., Definition 3 - executing one branch of $n_i$ always leads to $n_j$, whereas executing another branch of $n_i$ can cause $n_j$ to be bypassed. The additional constraints that are added (e.g., $n_j$ always occurs before any occurrence of $n_i$) limit the region in which $n_j$ is seen or bypassed to segments leading up to the next occurrence of $n_i$ - ensuring that $n_i$ is indeed controlling $n_j$. The definition below considers maximal paths (which include infinite paths) and thus is sensitive to non-termination.

Definition 6 ($n_i \xrightarrow{ntscd} n_j$). In a CFG, $n_j$ is (directly) non-termination sensitive control dependent on node $n_i$ if $n_i$ has at least two successors, $n_k$ and $n_l$, such that

(1) for all maximal paths from $n_k$, $n_j$ always occurs and it occurs before any occurrence of $n_i$;

(2) there exists a maximal path from $n_l$ on which either $n_j$ does not occur, or $n_j$ is strictly preceded by $n_i$.

We  supplement  a  traditional  presentation  of  dependence  definitions  with  definitions  given 
as  formulae  in  computation  tree  logic  (CTL)  [14].  CTL  is  a  logic  for  describing  the 
structure  of  sets  of  paths  in  a  graph,  making  it  a  natural  language  for  expressing  control 
dependences.  Informally,  CTL  includes  two  path  quantifiers,  E  and  A,  which  define  that 
a  path  from  a  given  node  with  a  given  structure  exists  or  that  all  paths  from  that  node 
have the given structure. The structure of a path is defined using one of five modal
operators (we refer to a node satisfying $\phi$ as a $\phi$-node): $X\phi$ states that the successor
node is a $\phi$-node, $F\phi$ states the existence of a $\phi$-node, $G\phi$ states that a path consists
entirely of $\phi$-nodes, $\phi\,U\,\psi$ states the existence of a $\psi$-node and that the path leading
up to that node consists of $\phi$-nodes; finally, the $\phi\,W\,\psi$ operator is a variation on $U$ that
relaxes the requirement that a $\psi$-node exist. In a CTL formula path quantifiers and
modal operators occur in pairs, e.g., $AF\phi$ says that on all paths from a node a $\phi$-node occurs.
A  formal  definition  of  CTL  can  be  found  in  [14]. 

The following CTL formula captures the definition of control dependence above.

$$n_i \xrightarrow{ntscd} n_j \;=\; (G, n_i) \models EX(A[\neg n_i \; U \; n_j]) \;\wedge\; EX(E[\neg n_j \; W \; (\neg n_j \wedge n_i)]).$$

Here, $(G, n_i) \models$ expresses the fact that the CTL formula is checked against the graph
$G$ at node $n_i$. The two conjuncts are essentially a direct transliteration of the natural
language above.
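As a rough illustration of how the formula above can be checked mechanically, the following Python sketch evaluates the two conjuncts by fixpoint iteration over a CFG given as a successor map. The function names (`au`, `ew`, `ntscd_ctl`), the treatment of nodes without successors as the ends of finite maximal paths, and the five-node example graph are all assumptions made for this sketch; the graph is hypothetical and is not a transcription of Fig. 1.

```python
def au(succ, phi, psi):
    """Nodes satisfying A[phi U psi] over maximal paths (least fixpoint)."""
    z = {n for n in succ if psi(n)}
    changed = True
    while changed:
        changed = False
        for n, outs in succ.items():
            # A node joins only if it satisfies phi and every successor is already in z.
            if n not in z and phi(n) and outs and all(m in z for m in outs):
                z.add(n)
                changed = True
    return z


def ew(succ, phi, psi):
    """Nodes satisfying E[phi W psi] over maximal paths (greatest fixpoint)."""
    z = {n for n in succ if phi(n) or psi(n)}
    changed = True
    while changed:
        changed = False
        for n in list(z):
            if psi(n):
                continue
            outs = succ[n]
            # A phi-node survives if its path may stop here or continue inside z.
            if outs and not any(m in z for m in outs):
                z.remove(n)
                changed = True
    return z


def ntscd_ctl(succ, ni, nj):
    """Check EX(A[!ni U nj]) and EX(E[!nj W (!nj and ni)]) at node ni."""
    if len(set(succ[ni])) < 2:
        return False
    first = au(succ, lambda n: n != ni, lambda n: n == nj)
    second = ew(succ, lambda n: n != nj, lambda n: n != nj and n == ni)
    return any(k in first for k in succ[ni]) and any(l in second for l in succ[ni])


# Hypothetical CFG: a loop between b and c, with an exit from c through d to e.
CFG = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": ["e"], "e": []}
print(ntscd_ctl(CFG, "c", "d"))
```

On this graph the check succeeds for the pair (c, d) because the branch to d reaches d immediately, while the branch to b returns to c without passing through d.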

We have formulated the definition above to apply to execution traces instead of
CFG paths. In this setting one needs to bound relevant segments by $n_i$ as discussed
above. However, when working on CFG paths, the definition conditions can actually be
simplified to read as follows: (1) for all maximal paths from $n_k$, $n_j$ always occurs, and
(2) there exists a maximal path from $n_l$ on which $n_j$ does not occur. The corresponding
CTL formula is

$$n_i \xrightarrow{ntscd} n_j \;=\; (G, n_i) \models EX(AF(n_j)) \;\wedge\; EX(EG(\neg n_j)).$$

See [7] for the proof that these two definitions are equivalent on CFGs.
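Under the simplified conditions, a successor of $n_i$ either lies in $AF(n_j)$ or starts a maximal path that avoids $n_j$, so it is enough to compute one set per target node. The sketch below does this with a least-fixpoint iteration. The names `all_paths_reach` and `ntscd`, the successor-map representation, and the example graph are assumptions of this sketch rather than part of the paper's algorithm.

```python
def all_paths_reach(succ, target):
    """Nodes from which every maximal path contains target (CTL: AF target)."""
    reach = {target}
    changed = True
    while changed:
        changed = False
        for n, outs in succ.items():
            # A node without successors ends a maximal path, so it never joins.
            if n not in reach and outs and all(m in reach for m in outs):
                reach.add(n)
                changed = True
    return reach


def ntscd(succ):
    """All pairs (ni, nj) such that nj is ntscd-dependent on ni (simplified form)."""
    deps = set()
    for nj in succ:
        af = all_paths_reach(succ, nj)
        for ni, outs in succ.items():
            distinct = set(outs)
            if (len(distinct) >= 2
                    and any(k in af for k in distinct)
                    and any(l not in af for l in distinct)):
                deps.add((ni, nj))
    return deps


# Hypothetical CFG with a loop between b and c that may never exit to d.
CFG = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": ["e"], "e": []}
print(sorted(ntscd(CFG)))
```

Only the conditional node c appears as a source of dependences here, since it is the only node with two successors.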

To see that this definition is non-termination sensitive, note that $c \xrightarrow{ntscd} d$ in Fig. 1 (a)
since there exists a maximal path (an infinite loop between b and c) where d never occurs.
Moreover, the definition corresponds to our intuition in Section 3.2 in that, in Fig. 1 (b
unaugmented) $a \xrightarrow{ntscd} e$ because there is an infinite loop through b, c, d and
$a \xrightarrow{ntscd} \{b, c, d\}$ because there is a maximal path ending in e that does not contain
b, c, or d. In Fig. 1 (c), note that $d \xrightarrow{ntscd} i$ because there is an infinite path from j
(cycle on j, d) on which i does not occur.

We now turn to constructing a non-termination insensitive version of control dependence.
The definition above considered all paths leading out of a conditional. Now, we
need to limit the reasoning to finite paths that reach a terminal region of the graph. To
handle this in the context of CFGs that do not have the unique end-node property, we
generalize the concept of end node to control sink - a set of nodes such that each node
in the set is reachable from every other node in the set and there is no path leading out
of the set. More precisely, a control sink $\kappa$ is a set of CFG nodes that form a strongly
connected component such that for each $n \in \kappa$ each successor of $n$ is also in $\kappa$. It is trivial
to see that each end node forms a control sink and each loop without any exit edges in
the graph forms a control sink. For example, {e} and {b, c, d} are control sinks in Fig. 1
(b unaugmented), and {e} and {d, i, j} are control sinks in Fig. 1 (c). c-sink denotes a
set-valued function on nodes such that c-sink(n) = S where if n belongs to a control
sink then S is the set of nodes representing that sink, otherwise $S = \emptyset$.
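The c-sink function can be sketched directly from this description: a node belongs to a control sink exactly when every node reachable from it can reach it back, and in that case the sink is the set of reachable nodes. The following Python sketch uses plain depth-first reachability instead of a dedicated strongly-connected-component algorithm; the names `reachable` and `c_sink` and the example graph are assumptions made for illustration, with the graph chosen so that its sinks are {e} and {b, c, d}.

```python
def reachable(succ, start):
    """Nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in succ[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def c_sink(succ, n):
    """Control sink containing n, or the empty set if n is in no control sink."""
    region = reachable(succ, n)
    # No path leaves the region by construction; it is a sink iff it is strongly connected.
    if all(n in reachable(succ, m) for m in region):
        return frozenset(region)
    return frozenset()


# Hypothetical CFG: a branches either into the exit-free loop b, c, d or to the end node e.
CFG = {"a": ["b", "e"], "b": ["c"], "c": ["d"], "d": ["b"], "e": []}
for node in CFG:
    print(node, sorted(c_sink(CFG, node)))
```

This quadratic formulation is only meant to mirror the definition; a linear-time strongly-connected-component pass would be the usual choice for large graphs.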

Existing definitions of non-termination insensitive control dependence rely on reasoning
about paths from the conditional to the end node. We generalize this to reason
about paths from a conditional to control sinks. The set of sink-bounded paths from $n_k$
(denoted SinkPaths($n_k$)) contains all paths $\pi$ such that $\pi$ is a path from $n_k$ to a node $n_s$ such
that $n_s$ belongs to a control sink.

Definition 7 ($n_i \xrightarrow{nticd} n_j$). In a CFG, $n_j$ is (directly) non-termination insensitively
control dependent on $n_i$ if $n_i$ has at least 2 successors, $n_k$ and $n_l$, such that

(1) for all paths $\pi \in SinkPaths(n_k)$, $n_j \in \pi$;

(2) there exists a path $\pi \in SinkPaths(n_l)$ such that $n_j \notin \pi$ and if $\pi$ leads to a control
sink $\kappa$, $n_j \notin \kappa$.

This definition is expressed in CTL as

$$n_i \xrightarrow{nticd} n_j \;=\; (G, n_i) \models EX(AF(n_j)) \;\wedge\; EX(E[\neg n_j \; U \; (\text{c-sink?} \wedge n_j \notin \text{c-sink})])$$

where A and E represent quantification over sink-bounded paths only, c-sink? evaluates
to true only if the current node belongs to a control sink and c-sink returns the sink set
associated with the current node.
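One way to experiment with Definition 7 is sketched below in Python. It rests on an assumed reading that is chosen to reproduce the examples discussed in this section: $n_j$ counts as occurring on a sink-bounded path if it lies on the path or in the control sink that the path reaches, so a successor "escapes" $n_j$ only when some path from it avoids $n_j$ and ends in a sink that does not contain $n_j$. The helper names (`reach`, `sink_of`, `nticd`) and both example graphs are hypothetical.

```python
def reach(succ, start, avoid=None):
    """Nodes reachable from start along paths that never visit `avoid`."""
    if start == avoid:
        return set()
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in succ[n]:
            if m != avoid and m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def sink_of(succ, n):
    """Control sink containing n, or the empty set."""
    region = reach(succ, n)
    return region if all(n in reach(succ, m) for m in region) else set()


def nticd(succ):
    """Pairs (ni, nj) with nj nticd-dependent on ni, under the reading stated above."""
    sink = {n: sink_of(succ, n) for n in succ}
    deps = set()
    for nj in succ:
        def escapes(start):
            # Some path from start misses nj and ends in a sink that lacks nj.
            return any(sink[s] and nj not in sink[s]
                       for s in reach(succ, start, avoid=nj))
        for ni, outs in succ.items():
            flags = [escapes(k) for k in set(outs)]
            if len(flags) >= 2 and any(flags) and not all(flags):
                deps.add((ni, nj))
    return deps


# Hypothetical CFG with control sinks {e} and {b, c, d}.
CFG = {"a": ["b", "e"], "b": ["c"], "c": ["d"], "d": ["b"], "e": []}
print(sorted(nticd(CFG)))
```

On a graph that consists of a single control sink containing a conditional, this sketch reports no dependences at all, which matches the observation that control structures inside a control sink do not give rise to control dependences.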

To see that this definition is non-termination insensitive, note that $c \stackrel{nticd}{\nrightarrow} d$ in Fig. 1
(a) since there does not exist a path from b to a control sink ({e} is the only control sink)
that does not contain d. Again, in Fig. 1 (b unaugmented) $a \xrightarrow{nticd} e$ because there is a path
from b to the control sink {b, c, d} and neither the path nor the sink contains e, and
$a \xrightarrow{nticd} \{b, c, d\}$ because there is a path ending in control sink {e} that does not contain b,
c, or d. It is interesting to note that in Fig. 1 (c), our definition concludes that $d \stackrel{nticd}{\nrightarrow} i$
because although there is a trivial path from d to the control sink {d, i, j}, i belongs to
that  control  sink.  This  is  because  our  definition  inherently  captures  a  form  of  fairness 
-  since  the  backedge  from  j  guarantees  that  d  will  be  executed  an  infinite  number  of 
times,  the  only  way  to  avoid  executing  i  would  be  to  branch  to  d  on  every  cycle.  The 
consequence  of  this  property  is  that  even  though  there  may  be  control  structures  inside 
of  a  control  sink,  they  will  not  give  rise  to  any  control  dependences.  In  applications  where 
one  desires  to  detect  such  dependences,  one  would  apply  the  definition  to  control  sinks 
in  isolation  with  back  edges  removed. 


4.1  Properties  of  the  Dependence  Relations 

We  begin  by  showing  that  the  new  definitions  of  control  dependence  conservatively  extend 
classic  definitions:  when  we  consider  our  definitions  in  the  original  setting  with  CFGs 
with  unique  end  nodes,  the  definitions  coincide  with  the  classic  definitions.  In  addition, 
direct  non-termination  insensitive  control  dependence  (Definition  7)  implies  the  transitive 
closure  of  direct  non-termination  sensitive  control  dependence. 


Theorem 1 (Coincidence Properties). For all CFGs with the unique end node
property, and for all nodes $n_i, n_j \in N$,

(1) $n_i \neq n_j$ and $n_i \xrightarrow{cd} n_j$ implies $n_i \xrightarrow{nticd} n_j$

(2) $n_i \xrightarrow{nticd} n_j$ implies $n_i \xrightarrow{cd} n_j$

(3) $n_i \xrightarrow{PC\text{-}wcd} n_j$ iff $n_i \xrightarrow{ntscd} n_j$

(4) For all CFGs, for all nodes $n_i, n_j \in N$: $n_i \xrightarrow{nticd} n_j$ implies $n_i \xrightarrow{ntscd}{}^{*}\, n_j$


For  the  correctness  (bisimulation-based)  proof  in  Section  5.1,  we  shall  need  a  few 
results  about  slice  sets  (members  of  which  are  “observable”).  A  crucial  property  is  that 
the  first  observable  on  any  path  will  be  encountered  sooner  or  later  on  all  other  paths: 

Lemma 1. Assume the node set $\Xi$ is closed under termination sensitive control dependency,
and that $n_0 \notin \Xi$. Assume that there is a path $\pi$ from $n_0$ to $n_1$, with $n_1 \in \Xi$ but
for all $n \in \pi$ with $n \neq n_1$, $n \notin \Xi$. Then all maximal paths from $n_0$ will contain $n_1$.

Proof. Assume, in order to arrive at a contradiction, that there exists a maximal path
from $n_0$ that does not contain $n_1$. We define a predicate $Q$, such that $Q(n)$ holds iff there
exists a maximal path from $n$ that does not contain $n_1$. By our assumption, $Q(n_0)$ holds;
clearly, $Q(n_1)$ does not hold. Therefore, $\pi$ can be written as $[n_0 .. n_2 n_3 .. n_1]$ where $Q(n_2)$
holds but $Q(n_3)$ does not hold (that is, there is an edge from $n_2$ to $n_3$; note that $n_2$ may
equal $n_0$ and that $n_3$ may equal $n_1$ but we know that $n_1 \neq n_2$).

We shall show that $n_2 \xrightarrow{ntscd} n_1$; then from $n_1 \in \Xi$ and from $\Xi$ being closed under
$\xrightarrow{ntscd}$ we get $n_2 \in \Xi$, which contradicts $n_1$ being the only node in $\pi$ which is also in $\Xi$.

Note that since $Q(n_2)$ holds, there exists a maximal path starting at $n_2$ not containing
$n_1$; that path has to have at least two elements (since $n_2$ has an outgoing edge) and the
second element cannot be $n_3$ (as $Q(n_3)$ does not hold). Therefore, the second element is
some node $n_4$ with $n_3 \neq n_4$, and there exists a maximal path from $n_4$ which does not
contain $n_1$. Our final obligation is to prove that all maximal paths from $n_3$ contain $n_1$,
which follows since $Q(n_3)$ does not hold.

In  a  similar  way  we  can  show: 

Lemma 2. Assume $\Xi$ is closed under $\xrightarrow{nticd}$, and $n_0 \notin \Xi$. Assume that there is a
path $\pi$ from $n_0$ to $n_1$, with $n_1 \in \Xi$ but for all $n \in \pi$ with $n \neq n_1$, $n \notin \Xi$. Then all
sink-bounded paths from $n_0$ will contain $n_1$.


As  a  consequence  we  have  the  following  result,  giving  conditions  to  preclude  the  existence 
of  infinite  un-observable  paths: 

Lemma 3. Assume that $n_0 \notin \Xi$, but that there is a path $\pi$ starting at $n_0$ which contains
a node in $\Xi$.

— If $\Xi$ is closed under termination insensitive control dependency, then all sink-bounded
paths starting at $n_0$ will reach $\Xi$.

— If $\Xi$ is also closed under termination sensitive control dependency, then all maximal
paths starting at $n_0$ will reach $\Xi$.

We are now ready for the main result, stating that from a given node there is a unique
first observable (for this, we need the CFG to be reducible, as can be seen by the
counterexample where from $n_0$ there are edges to $n_1$ and $n_2$ between which there is a cycle).

Theorem 2. Assume that $n_0 \notin \Xi$, that $n_1, n_2 \in \Xi$, and that there are paths $\pi_1 = [n_0 .. n_1]$
and $\pi_2 = [n_0 .. n_2]$ such that on both paths, all nodes except the last do not belong to $\Xi$.

If $\Xi$ is closed under termination insensitive control dependency (a weaker requirement
than being closed under termination sensitive control dependency), and if the CFG is
reducible, then $n_1 = n_2$.

Proof. Clearly, we can extend $\pi_1$ and $\pi_2$ into sink-bounded paths $\pi_1'$ and $\pi_2'$. By Lemma 2,
we infer that $\pi_2'$ contains $n_1$, and that $\pi_1'$ contains $n_2$. If $n_1 \neq n_2$, this implies that $n_1$ is
reachable from $n_2$, and vice versa, while both being reachable from $n_0$, something which
cannot happen in a reducible graph.

5  Slicing 

We now describe how to slice a (reducible) CFG $G$ wrt. a slice set $S_C$, the smallest
set containing $C$ which is closed under data dependence $\xrightarrow{dd}$ and also under some kind
of control dependence: at least we must require it is closed under $\xrightarrow{nticd}$, but a stronger
correctness property (Sect. 5.1) holds if it is also closed under $\xrightarrow{ntscd}$.

The result of slicing is a program with the same CFG as the original one, but with
the code map $code_1$ replaced by $code_2$. Here $code_2(n) = code_1(n)$ for $n \in S_C$; for $n \notin S_C$
then

—  if  n  is  a  statement  node  then  code2(n)  is  the  statement  skip; 

—  if  n  is  a  predicate  node  then  code2(n)  is  cskip,  the  semantics  of  which  is  that  it 
non-deterministically  chooses  one  of  its  successors. 

The  above  definition  is  conceptually  simple,  so  as  to  facilitate  the  correctness  proofs. 
Of  course,  one  would  want  to  do  some  post-processing,  like  eliminating  skip  commands 
and  eliminating  cskip  commands  where  the  two  successor  nodes  are  equal;  we  shall 
not  address  this  issue  further  but  remark  that  most  such  transformations  are  trivially 
meaning  preserving. 
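The construction of the sliced code map can be sketched in a few lines of Python. The sketch assumes that predicate nodes are exactly the nodes with more than one successor and that code is stored as strings; the function name `slice_code_map` and the example program are hypothetical.

```python
def slice_code_map(succ, code1, slice_set):
    """Build code2 from code1: keep slice nodes, neutralize all other nodes."""
    code2 = {}
    for n, stmt in code1.items():
        if n in slice_set:
            code2[n] = stmt        # node is observable: keep its code unchanged
        elif len(succ[n]) > 1:
            code2[n] = "cskip"     # predicate outside the slice: non-deterministic branch
        else:
            code2[n] = "skip"      # statement outside the slice
    return code2


# Hypothetical program: only nodes 1 and 5 are in the slice set.
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
CODE1 = {1: "x = read()", 2: "if y > 0", 3: "z = 1", 4: "z = 2", 5: "print(x)"}
CODE2 = slice_code_map(SUCC, CODE1, {1, 5})
print(CODE2)
```

The CFG itself is left untouched, which is what allows the correctness argument to compare the two code maps node by node.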


5.1  Correctness  Properties 

The main intuition behind our notion of slicing correctness is that the nodes in a slicing
criterion $C$ represent "observations" that one is making about a CFG $G$ under consideration.
Specifically, for an $n \in C$, one can observe that $n$ has been executed and also observe
the values of any variables referenced at $n$. Execution of nodes not in $C$ corresponds to
silent moves or non-observable actions. The slicing transformation should preserve the
behavior of the program with respect to $C$-observations, but parts of the program that
are irrelevant with respect to computing $C$-observations can be "sliced away". The slice
set $S_C$ built according to Definition 4 represents the nodes that are relevant for maintaining
the observations $C$. Thus, to prove the correctness of slicing we will establish the
stronger result that $G$ will have the same $S_C$-observations wrt. the original code map
$code_1$ as wrt. the sliced code map $code_2$, and this will imply that they have the same
$C$-observations.

The discussion above suggests that appropriate notions of correctness for slicing reactive
programs can be derived from the notion of weak bisimulation found in concurrency
theory, where a transition may include a number of $\tau$-moves [15]. In our setting, we shall
consider transitions that do one or more steps before arriving at a node in the slice set.

Definition 8. For $i = 1, 2$ we write $s \stackrel{i}{\longmapsto} s'$ to denote that wrt. code map $code_i$, the
program state $s$ rewrites in one step to $s'$.

For $i = 1, 2$ we write $s_0 \stackrel{i}{\Longrightarrow} s$ if there exists $s_1 \ldots s_k$ ($k \geq 1$) with $s_k = s$ such that
(with each $s_j = (n_j, \sigma_j)$)

— for all $j \in \{1 \ldots k\}$ we have $s_{j-1} \stackrel{i}{\longmapsto} s_j$;

— $n_k \in S_C$ but for all $j \in \{1 \ldots k-1\}$, $n_j \notin S_C$.

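Definition 9. A binary relation $S$ on program states is a bisimulation if whenever
$(s_1, s_2) \in S$ then

(a) if $s_1 \stackrel{1}{\Longrightarrow} s_1'$ then there exists a $s_2'$ such that $s_2 \stackrel{2}{\Longrightarrow} s_2'$ and $(s_1', s_2') \in S$, and

(b) if $s_2 \stackrel{2}{\Longrightarrow} s_2'$ then there exists a $s_1'$ such that $s_1 \stackrel{1}{\Longrightarrow} s_1'$ and $(s_1', s_2') \in S$.

If instead of (b) we only have (c) below, we say that $S$ is a quasi-bisimulation.

(c) if $s_2 \stackrel{2}{\Longrightarrow} s_2'$ then either $s_1 \stackrel{1}{\not\Longrightarrow}$ (that is, $s_1$ has no such transition), or there exists
a $s_1'$ such that $s_1 \stackrel{1}{\Longrightarrow} s_1'$ and $(s_1', s_2') \in S$.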

For each node $n$ in $G$, we define $relv(n)$, the set of relevant variables at $n$, by stipulating
that $x \in relv(n)$ if there exists a node $n_k \in S_C$ and a path $\pi$ from $n$ to $n_k$ such that
$x \in refs(n_k)$, but $x \notin defs(n_j)$ for all nodes $n_j$ occurring before $n_k$ in $\pi$.

The above is well-defined in that it does not matter whether we use $code_1$ or $code_2$,
as it is easy to see that the value of $relv(n)$ is not influenced by the content of nodes not
in $S_C$, since that set is closed under $\xrightarrow{dd}$. (Also, the closedness properties of $S_C$ are not
affected by using $code_2$ rather than $code_1$.)
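The set $relv(n)$ can be computed by a backward fixpoint iteration that closely resembles live-variable analysis: a variable is relevant at $n$ if it is referenced at $n$ and $n$ is in the slice set, or if it is relevant at some successor and not defined at $n$. The Python sketch below follows that recurrence; the name `relevant_vars`, the dictionary-based `refs`/`defs` maps, and the four-node example are assumptions made for illustration.

```python
def relevant_vars(succ, refs, defs, slice_set):
    """Compute relv(n) for every node by iterating to a fixpoint."""
    relv = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # Trivial path: variables referenced at n itself, if n is observable.
            new = set(refs[n]) if n in slice_set else set()
            # Longer paths: relevant at a successor and not overwritten at n.
            for m in succ[n]:
                new |= relv[m] - set(defs[n])
            if new != relv[n]:
                relv[n] = new
                changed = True
    return relv


# Hypothetical straight-line program: 0: z = 5; 1: x = read(); 2: y = x + 1; 3: print(y)
SUCC = {0: [1], 1: [2], 2: [3], 3: []}
REFS = {0: [], 1: [], 2: ["x"], 3: ["y"]}
DEFS = {0: ["z"], 1: ["x"], 2: ["y"], 3: []}
print(relevant_vars(SUCC, REFS, DEFS, {1, 2, 3}))
```

In this example no variable is relevant at nodes 0 and 1, since x is defined at node 1 before its only use.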

We  are  now  ready  to  state  the  correctness  theorem: 

Theorem 3. Let the relation $S_0$ be given by $(n_1, \sigma_1)\; S_0\; (n_2, \sigma_2)$ iff $n_1 = n_2$ and
$\sigma_1 =_{relv(n_1)} \sigma_2$. Then (if $G$ is reducible)

— $S_0$ is a quasi-bisimulation;

— $S_0$ is even a bisimulation if $S_C$ is closed under $\xrightarrow{ntscd}$.

Proof. (Sketch.) We must consider transitions of the form $(n, \sigma_i) \stackrel{i}{\Longrightarrow} (n', \sigma_i')$; that is, we
have $(n, \sigma_i) \stackrel{i}{\longmapsto} (n'', \sigma_i'')$ and either $n'' = n'$ or $(n'', \sigma_i'') \stackrel{i}{\Longrightarrow} (n', \sigma_i')$.

With $j = 3 - i$, our general goal is to simulate the above transition wrt. $code_j$. For
three cases, listed below, we find $\sigma_j''$ such that $(n, \sigma_j) \stackrel{j}{\longmapsto} (n'', \sigma_j'')$ with $\sigma_i'' =_{relv(n'')} \sigma_j''$:
then we are done if $n'' = n'$; otherwise we apply inductive reasoning.

$n \in S_C$: Here $\sigma_i =_{relv(n)} \sigma_j$. Therefore, if $n$ is a predicate, the same branch will be taken;
if $n$ is a statement, the stores will be updated with the same value.

$n \notin S_C$ is a statement: Here $code_2(n) = $ skip, and the claim follows since the value
stored by $code_1(n)$ will not belong to $relv(n'')$ (as $S_C$ is closed under $\xrightarrow{dd}$).

$n \notin S_C$ is a predicate, $i = 1$: Then $code_2(n) = $ cskip, and the claim is trivial.

We are left with the interesting case where $n \notin S_C$ is a predicate, $i = 2$. Two subcases:

— $S_C$ is closed under $\xrightarrow{ntscd}$: we must show (b) of Definition 9. But Lemma 3 tells us that
there exists $n_1, \sigma_1'$ such that $(n, \sigma_1) \stackrel{1}{\Longrightarrow} (n_1, \sigma_1')$, where $n_1 = n'$ by Theorem 2. For
$x \in relv(n')$, we have to show that $\sigma_1'(x) = \sigma_2'(x)$, which follows since such variables
cannot be modified along the way (again since $S_C$ is closed under $\xrightarrow{dd}$).

— otherwise, we only have to show (c) of Definition 9, so assume that there exists $n_1, \sigma_1'$
such that $(n, \sigma_1) \stackrel{1}{\Longrightarrow} (n_1, \sigma_1')$. By Theorem 2 we infer that $n_1 = n'$, and we proceed
as in the previous case.

6 Non-Termination Sensitive Control Dependence Algorithm

Control dependences are calculated using a symbolic data-flow analysis. Fundamentally,
control dependences are determined by reasoning about properties of sets of CFG paths;
those sets are represented symbolically in our algorithm. Specifically, for each node $n$
with more than one successor in $G$, the set of paths starting at $n$ that begin with $n \to m$
is represented by $t_{nm}$. The algorithm propagates these symbolic values to collect the
effects of particular control flow choices at program points in the CFG. For each node $p$
in the CFG a set of symbolic values, $S_{pn}$, is stored for each node $n$ in the CFG that
has more than one successor; these sets record the set of paths that originate from $n$.
The algorithm preserves the invariant that if $t_{nm}$ is in $S_{pn}$ then all paths from $n$ starting
with $n \to m$ contain node $p$. A complete description of the algorithm, its correctness and
its complexity are given in [7]; in the rest of this section we provide an overview of its
main processing steps and its adaptation to computing other forms of dependence.

Let $T_n$ denote the outdegree of $n$ and condNodes($G$) denote the set of nodes with
outdegree greater than one.

The algorithm is initialized such that, for each node $n \in$ condNodes($G$), $t_{nm}$ is inserted
into $S_{mn}$ for each successor $m$ of $n$ and $m$ is marked for processing. The algorithm
then proceeds by executing the following three steps for each marked node $m$; it terminates
when there are no longer any marked nodes.

1. For each node $n \in$ condNodes($G$)$\setminus m$, if $|S_{mn}| = T_n$ then, for each node $p \in$
condNodes($G$)$\setminus m$, all symbols from $S_{np}$ are inserted into $S_{mp}$. This captures the
property that if all non-terminal paths or terminal paths that end in exit nodes from
every successor of $n$ contain $m$, then these paths will also contain $p$.

2. Depending on the number of successors of $m$, one of the following actions is performed
if any $S_{mp}$ was changed.

|succs($m$)| = 1: Let $p$ be the successor of $m$. For each node $n$ such that $S_{pn} \setminus S_{mn} \neq \emptyset$,
insert $S_{mn}$ into $S_{pn}$ and add $p$ to the worklist. This captures the property that
all non-terminal paths or terminal paths that end in exit nodes that contain $m$
will also contain $p$.

|succs($m$)| > 1: For each node $n$, if $|S_{nm}| = T_m$ then $n$ is marked for processing. This
captures the requirement that any path information change at $m$ needs to be
considered at each node $n$ that will occur on all non-terminal paths or terminal
paths that end in exit nodes starting from $m$.

3. Unmark $m$.


When there are no more marked nodes, all-path reachability information for every
pair of nodes, $n$ and $m$ (with outdegree greater than one), in the graph is available
in $S_{nm}$. The presence of a token $t_{mp}$ in $S_{nm}$ indicates that all non-terminal paths or
terminal paths that end in exit nodes starting with the edge $m \to p$ contain $n$. So, if
$|S_{nm}| > 0 \wedge |S_{nm}| \neq T_m$ then, by Definition 6, it can be inferred that $n$ is directly control
dependent on $m$. On the other hand, if $|S_{nm}| > 0$ and $|S_{nm}| = T_m$ then, by Definition 6,
it can be inferred that $n$ is not directly control dependent on $m$.
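The token-propagation idea can be tried out with the following Python sketch. It is a simplified saturation-style variant written for illustration, not a transcription of the algorithm in [7]: it re-applies both propagation rules whenever a node is taken from the worklist instead of tracking a change flag, it merges a row into another whenever the source holds tokens that the target lacks, and it assumes that each node lists its successors without duplicates. The function name `token_ntscd` and the example graph are hypothetical.

```python
def token_ntscd(succ):
    """Map each node p to the set of conditionals n on which p is directly dependent."""
    cond = [n for n in succ if len(succ[n]) > 1]
    T = {n: len(succ[n]) for n in cond}              # outdegree of each conditional
    # S[p][n] holds tokens (n, m): every maximal path starting with n -> m contains p.
    S = {p: {n: set() for n in cond} for p in succ}
    work = set()
    for n in cond:
        for m in succ[n]:
            S[m][n].add((n, m))
            work.add(m)

    def merge(dst, src):
        """Copy all tokens recorded at src into dst; report whether dst grew."""
        changed = False
        for n in cond:
            if not S[src][n] <= S[dst][n]:
                S[dst][n] |= S[src][n]
                changed = True
        return changed

    while work:
        m = work.pop()
        # If all branches of n lead to m, paths known to contain n also contain m.
        again = True
        while again:
            again = False
            for n in cond:
                if n != m and len(S[m][n]) == T[n] and merge(m, n):
                    again = True
        if len(succ[m]) == 1:
            p = succ[m][0]                           # paths through m continue to p
            if merge(p, m):
                work.add(p)
        elif len(succ[m]) > 1:
            for p in succ:                           # nodes on all paths from m
                if p != m and len(S[p][m]) == T[m] and merge(p, m):
                    work.add(p)

    return {p: {n for n in cond if 0 < len(S[p][n]) < T[n]} for p in succ}


# Hypothetical CFG with a loop between b and c that may never exit to d.
CFG = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": ["e"], "e": []}
print(token_ntscd(CFG))
```

The final comprehension applies the test described above: a node depends on a conditional when some, but not all, of the conditional's tokens have reached it.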

The proposed algorithms have a worst-case asymptotic complexity of $O(|N|^3 \times K)$
where $K$ is the sum of the outdegree of all nodes with more than one successor in the
CFG. Linear time algorithms to calculate control dependence have been proposed in the
literature [1]. These algorithms, however, rely on augmentation of the CFG. The practical
cost of this augmentation varies with the specific algorithm and control dependence
being calculated. Our experience with an implementation of our general algorithms in a
program slicer for full Java suggests that, despite its complexity bound, it can be scaled
to programs with tens of thousands of lines of code and still return results in a matter of
seconds. We suspect that this is due in part to the elimination of the need for augmenting
CFGs in our approach.


7  Related  Work 

Fifteen years ago, control dependence was rigorously explored by Podgurski and Clarke
in [1]. Since then, there has been a variety of work related to calculation and application
of control dependence in the setting of CFGs that satisfy the unique end node property.

In the realm of calculating control dependence, Bilardi et al. [13] proposed new concepts
related to control dependence along with algorithms based on these concepts to
efficiently calculate weak control dependence. In [16], Johnson proposed an algorithm
that could be used to calculate control dependence in time linear in the number of edges.
In comparison, in this paper we sketch a feasible algorithm in a more general setting.

In the context of slicing, Horwitz, Reps, and Binkley [17] presented what has now become
the standard approach to inter-procedural slicing via dependence graphs. Recently,
Allen and Horwitz [18] extended previous work on slicing to handle exception-based
inter-procedural control flow. In this work, they handle CFGs with two end nodes (one for
normal return and one for exceptional return) but it is unclear how this affects the control
dependence captured by the dependence graph. In comparison, we have shown program
slicing is feasible with unaugmented CFGs, and the extended version of the paper describes
in greater detail how our definitions interact with inter-procedural slicing.

In relevant work on slicing correctness [19], Horwitz et al. use a semantics-based
multi-layered approach to reason about the correctness of slicing in the realm of data
dependence. In [9], Ball et al. used a program-point-specific, history-based approach to
prove the correctness of slicing for arbitrary control flow. We build on that work to
consider arbitrary control flow without the unique end-node restriction. Their correctness
property is a weaker property than bisimulation - it does not require ordering to be
maintained between observable nodes if there is no dependence between these nodes
- and it holds for irreducible CFGs. Even though our definitions apply to irreducible
graphs, we need the extra structure of reducible graphs to achieve the stronger correctness
property. We are currently investigating if we can establish their correctness property
using our control dependence definitions on irreducible graphs.

In [5], Hatcliff et al. presented notions of dependence for concurrent CFGs, and proposed
a notion of bisimulation as the correctness property. Millett and Teitelbaum [20]
study static slicing of Promela (the model description language for the model-checker
SPIN) and its application to model checking, simulation, and protocol understanding,
but they do not formalize a notion of correct slice nor do they discuss issues related to
preserving non-termination and liveness properties. Krinke [21] considers static slicing
of multi-threaded programs with shared variables, and focuses on issues associated with
inter-thread data dependence but does not consider non-termination sensitive forms of
control dependence.

8  Conclusion 

The notion of control dependence is used in a myriad of applications, and researchers and
tool builders increasingly seek to apply it to modern software systems and high-assurance
applications - even though the control flow structure and semantic behavior of these
systems do not mesh well with the requirements of existing control dependence
definitions. In this paper, we have proposed conceptually simple definitions of control
dependence that (a) can be applied directly to the structure of modern software thus
avoiding unsystematic preprocessing transformations that introduce overhead, conceptual
complexity, and sometimes dubious semantic interpretations, and (b) provide a solid
semantic foundation for applying control dependence to reactive systems where program
executions may be non-terminating.

We have rigorously justified these definitions by detailed proofs, by expressing them
in temporal logic which provides an unambiguous definition and allows them to be
mechanically checked/debugged against examples using automated verification tools, by
showing their relationship to existing definitions, and by implementing and experimenting
with them in a publicly available slicer for full Java. In addition, we have provided
algorithms for computing these new control dependence relations, and argued that any
additional cost in computing these relations is negligible when one considers the cost and
ill-effects of preprocessing steps required for previous definitions. Thus, we believe that
there are many benefits to widely applying these definitions in static analysis tools.

In ongoing work, we continue to explore the foundations of statically and dynamically
calculating dependences for concurrent Java programs for slicing, program verification,
and security applications. In particular, we are exploring the relationship between
dependences extracted from execution traces and dependences extracted from control-flow
graphs in an effort to systematically justify a comprehensive set of dependence notions
for the rich features found in concurrent Java programs.

A Details

A.1 Algorithm to calculate Non-Termination Sensitive Control Dependence

Proof of correctness We show that phase (1) and (2) of the algorithm are correct by
proving that the following loop invariant holds for the outer loop in each phase.

$t_{n_1 n_2} \in S_{n_3 n_1}$ implies each non-trivial loop-free segment $\pi(n_2, n_1?)$ of each
maximal path $\pi = [n_2 .. n_1?]$ contains $n_3$.

At the beginning of phase (1), each token set $S_{n_3 n_1}$ is empty. Hence, the invariant
is trivially established. In the loops at lines 9 and 11, for each immediate successor node
$n_2$ of each conditional node $n_1$, $t_{n_1 n_2}$ is injected into $S_{n_2 n_1}$. This trivially preserves the
invariant at the end of the loop as $n_3 = n_2$ occurs on all segments starting at $n_2$. The loop
will terminate as the number of nodes in the graph is finite.

Now  the  reasoning  about  phase  (2). 


Non-Termination-Sensitive-Control-Dependence(G)

 1  G(N, E, n0, NE) : a control flow graph.
 2  S[|N|, |N|] : a matrix of sets where S[n1, n2] represents S_{n1 n2}.
 3  T[|N|] : a sequence of integers where T[n1] denotes T_{n1}.
 4  CD[|N|] : a sequence of sets.
 5  workbag : a set of nodes.
 6
 7  # (1) Initialize
 8  workbag ← ∅
 9  for each n1 in condNodes(G)
10    do succs = succs(n1, G)
11       for each n2 in succs
12         do workbag ← workbag ∪ {n2}
13            S[n2, n1] ← S[n2, n1] ∪ {t_{n1 n2}}
14
15  # (2) Calculate all-path reachability
16  while workbag ≠ ∅
17    do flag ← false
18       n3 ← remove(workbag)
19       for each n1 in condNodes(G)\n3
20         do if |S[n3, n1]| = T[n1]
21              then for each n4 in condNodes(G)\n3
22                     do if S[n1, n4]\S[n3, n4] ≠ ∅
23                          then S[n3, n4] ← S[n3, n4] ∪ S[n1, n4]
24                               flag = true
25
26       if flag and |succs(n3, G)| = 1
27         then n5 ← select(succs(n3, G))
28              for n4 in condNodes(G)
29                do if S[n5, n4]\S[n3, n4] ≠ ∅
30                     then S[n5, n4] ← S[n5, n4] ∪ S[n3, n4]
31                          workbag ← workbag ∪ {n5}
32         else if flag and |succs(n3, G)| > 1
33           then for each n4 in N
34                  do if |S[n4, n3]| = T[n3]
35                       then workbag ← workbag ∪ {n4}
36
37  # (3) Calculate non-termination sensitive control dependence
38  for each n3 in N
39    do for each n1 in condNodes(G)
40         do if |S[n3, n1]| > 0 and |S[n3, n1]| ≠ T[n1]
41              then CD[n3] ← CD[n3] ∪ {n1}
42
43  return CD


Fig.  2.  The  algorithm  to  calculate  non-termination  sensitive  control  dependence. 


Initialization  At  the  beginning  of  phase  (2),  the  invariant  holds  as  it  held  at  the  ter¬ 
mination  of  phase  (1). 

Maintenance The loops at lines 19-23 and at lines 28-30 ensure that if all non-trivial
loop-free $(n_1, n_1?)$ segments from $n_1$ contain $n_3$, then all non-trivial loop-free
$(n_4, n_4?)$ segments that contain $n_1$ will contain $n_3$.³ Hence, the loop invariant
is maintained.

Termination In each iteration, the size of a token set either increases by at least one
or remains the same. Since each token set is bounded in size, the sizes eventually
stabilize (stop increasing), after which flag is no longer set to true in the conditional
at line 22 and no elements are added to the workbag at lines 31 and 35. Hence, the loop at
line 16 will terminate. Upon termination, as the sizes of the token sets remain unchanged,
each set must have reached its maximal size.

Hence, after phases (1) and (2), for each successor $n_2$ of $n_1 \in condNodes(G)$,
$t_{n_1 n_2} \in S_{n_3 n_1}$ only if each non-trivial loop-free segment $\pi(n_2, n_1?)$
of each maximal path $\pi = [n_2 .. n_1?]$ contains $n_3$.

In phase (3), direct control dependence is calculated based on the available reachability
information. This phase terminates because the graph has finitely many nodes and edges.
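As a rough illustration of phase (3) only, the sketch below applies the test from lines 38-41 to token sets that are written out by hand. The function name `control_dependence`, the dictionary encoding of S keyed by node pairs, and the encoding of tokens as pairs are assumptions made for this example and do not come from the paper.

```python
def control_dependence(nodes, cond, S, T):
    """Phase (3): n3 depends on n1 when some, but not all, of n1's tokens reach n3."""
    CD = {n3: set() for n3 in nodes}
    for n3 in nodes:
        for n1 in cond:
            # Missing entries stand for empty token sets.
            reached = len(S.get((n3, n1), ()))
            if reached > 0 and reached != T[n1]:
                CD[n3].add(n1)
    return CD


# Hand-written token sets for a diamond CFG  c -> {a, b} -> j.
# Illustrative input only; it was not produced by running phases (1) and (2).
nodes = ["c", "a", "b", "j"]
cond = ["c"]
T = {"c": 2}
S = {
    ("a", "c"): {("c", "a")},
    ("b", "c"): {("c", "b")},
    ("j", "c"): {("c", "a"), ("c", "b")},
}
CD = control_dependence(nodes, cond, S, T)
```

In this input, a and b each hold one of the two tokens of c and are reported as dependent on c, while j holds both tokens and is not.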

Complexity analysis In phase (1), for every node with multiple successors in the CFG,
each of its successors is processed. Hence, it leads to a worst-case asymptotic complexity
of $O(|E|)$ for phase (1). In phase (3), for each node, every node in the CFG is processed,
leading to a worst-case asymptotic complexity of $O(|N|^2)$ for this phase.
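The phase (1) step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the CFG is encoded as a dictionary from nodes to successor lists, conditional nodes are taken to be the nodes with more than one successor, T[n] is taken to be the out-degree of n, and the injected token is assumed to be $t_{n_1 n_2}$ encoded as a pair. The names `cond_nodes` and `initialize` are hypothetical.

```python
from collections import defaultdict


def cond_nodes(succs):
    """Nodes with more than one successor (assumed meaning of condNodes)."""
    return [n for n, out in succs.items() if len(out) > 1]


def initialize(succs):
    """Phase (1): inject one token per (conditional node, successor) edge."""
    S = defaultdict(set)   # S[(n2, n1)] plays the role of S[n2, n1]
    T = {}                 # T[n1] = out-degree of conditional node n1 (assumption)
    workbag = set()
    visited_edges = 0      # counts processed successors, at most |E|
    for n1 in cond_nodes(succs):
        T[n1] = len(succs[n1])
        for n2 in succs[n1]:
            workbag.add(n2)
            S[(n2, n1)].add((n1, n2))   # token t_{n1 n2}, encoded as a pair
            visited_edges += 1
    return S, T, workbag, visited_edges


# Diamond-shaped CFG: c branches to a and b, which both reach j.
diamond = {"c": ["a", "b"], "a": ["j"], "b": ["j"], "j": []}
S, T, workbag, visited_edges = initialize(diamond)
```

Each outgoing edge of a conditional node is visited once, which is where the $O(|E|)$ bound for this phase comes from.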

In phase (2), the loop at line 16 iterates until the size of the token sets represented
by $S$ stabilizes. The maximum size of a token set $S[n_1, n_2]$ is given by $T[n_2]$, which
is equal to the outdegree of $n_2$. In each iteration, the size of a token set either
increases by at least one or remains the same. In the former case, it contributes an
iteration. As the size of the token sets $S[n_1, n_2]$ is bounded, all token sets of
$S[n_1]$ will stabilize in a total of $\sum T[i]$ or fewer iterations. The loops at lines 19
and 21 contribute $O(|condNodes(G)|^2) \approx O(|N|^2)$ to each such iteration. Hence, the
worst-case complexity of phase (2) will be $O(|N|^3 \times \sum T[i] \times \lg |N|)$ by
factoring in the complexity $O(\lg |N|)$ of set operations.
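One possible reading of how the factors of the phase (2) bound combine is given below. Attributing the leading $|N|$ factor to the number of rows $S[n_1]$ is an interpretation and is not stated explicitly in the text.

```latex
\[
\underbrace{|N|}_{\text{rows } S[n_1]}
\times
\underbrace{\sum T[i]}_{\text{growth steps per row}}
\times
\underbrace{O(|N|^2)}_{\text{loops at lines 19 and 21}}
\times
\underbrace{O(\lg |N|)}_{\text{set operation}}
\;=\;
O\!\left(|N|^3 \times \sum T[i] \times \lg |N|\right)
\]
```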

By combining the above information, the worst-case complexity to calculate indirect
control dependence via phases (1), (2), and (3) will be
$O(|E| + |N|^3 \times \sum T[i] \times \lg |N| + |N|^2)$. However, as
$O(|N|^3 \times \sum T[i] \times \lg |N|)$ dominates $O(|N|^2)$ and $O(|E|)$, the complexity
will be $O(|N|^3 \times \sum T[i] \times \lg |N|)$ when $\sum T[i] \times \lg |N| > 1$. It
will be $O(|N|^2 + |N|)$ when $\sum T[i] = 0$.
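To see which of the two cases applies to a particular graph, the quantities involved can be computed directly. The sketch below assumes that $T[i]$ is the out-degree of conditional node $i$ and that non-conditional nodes contribute nothing to the sum. The function name `dominant_case` and the dictionary encoding of the CFG are hypothetical.

```python
import math


def dominant_case(succs):
    """Return |N|, |E|, the sum of T[i], and which stated bound applies."""
    n = len(succs)
    e = sum(len(out) for out in succs.values())
    # Only nodes with more than one successor contribute to the sum (assumption).
    sum_t = sum(len(out) for out in succs.values() if len(out) > 1)
    if sum_t == 0:
        case = "O(|N|^2 + |N|)"
    elif sum_t * math.log2(n) > 1:
        case = "O(|N|^3 x sum T[i] x lg|N|)"
    else:
        case = "not covered by the two stated cases"
    return n, e, sum_t, case


straight_line = {"a": ["b"], "b": ["c"], "c": []}
diamond = {"c": ["a", "b"], "a": ["j"], "b": ["j"], "j": []}
```

A graph without conditional nodes, such as `straight_line`, falls into the second case. Any graph with at least one branch, such as `diamond`, falls into the first.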

As in practice $|condNodes(G)|^2 \ll |N|$, the complexity in the case where
$\sum T[i] \times \lg |N| > 1$ will reduce to $O(|N|^2 \times \sum T[i] \times \lg |N|)$.


3 Suppose there is a non-trivial loop-free $(n_4, n_4?)$ segment that contains $n_1$ but not
$n_3$. This implies there is a non-trivial loop-free segment from $n_1$ to $n_4$ to $n_1$
that does not contain $n_3$, hence leading to a contradiction.


